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An  Optimization  Algorithm  and   its  Application   in  Queueing 


Abstract 


Finding  the  optimal  service  rate  for  a  finite  state  version  of 
M/M/l  is  examined  as  an  example  of  designing  queueing  systems.   Deri- 
vatives, concavity,  and  an  optimization  algorithm  are  developed. 


Introduction 

The  ever  increasing  number  of  papers  discussing  aspects  of  single 
channel  queueing  systems  stands  as  ample  evidence  that  in  a  very  real  sense 
stochastic  moc  als  are  always  partially  analyzed.  Complete  understanding 
of  a  queueing  system  involves  more  than  constructing  valid  equations  for 
probabilities  and  or  conditional  expectations  and  discovering  the  general 
forms  of  solutions,   It  requires  precise,  descriptions  of  how  the  system 
reacts  to  changing  parameter  values  especially  as  viewed  from  the  perspec- 
tive of  possible  measures  of  system  performance.  In  mathematical  terms 
this  means  not  only  considering  the  linear  operators  of  Markov  processes 
but  also  various  perturbations.  This  type  of  analysis  is  needed  to  es- 
tablish the  nature  of  problems  of  finding  optimal  systems.  Beyond  this 
is  the  interesting  and  eminently  practical  problem  of  efficient  computa- 
tional methods  of  finding  optimal  systems.  To  make  these  issues  precise 
and  specific,  this  paper  will  consider  designing  a  truncated  version  of 
M/M/l.  Early  work  on  the  limiting  distribution  was  used  to  produce  a 
square  root  formula  [6l,  Kumin  provided  some  general  qualitative  results 
and  some  interesting  numerical  work  [5],  Bvans  [l*]  and  Schweitzer  [7 J  have 
provided  general  perturbation  formula.  ,  Together  their  results  are  at 
best  a  beginning. 
Model 

the  problem  is  to  choose  the  service  rate,  us  for  a  finite  state  dis- 
crete time  version  of  M/X/l.  Let 

m  ■  the  maximum  number  of  customers  allowed  in  the  system 

B  =  the.  discount  factor 

g  «  the  gain  resulting  from  a  service  completion 
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G  ■  the  vector  of  one  period  expected  gains  for  the  possible  starting 
states  o  to  ra. 

G  ■  {a>  llgi  l*g»  •  •  ..,  M«) 

2      3 
h(i)  *  a  i  +•  a  i  +  a„i'  be  the  holding  cost  of  starting  a  period 

with  i  customers  in  the  system  a-,  i_,  a  non-negative. 

H  -  (h(o),  .  .  . ,  h  (m)) 

6    i 
c(|i)  ".2,0  ji,  be  the  cost  of  using  service  rate  y, 

c(u.}  is  convex  in  ^ 

X  B  the  given  arrival  rate 

V  »  V(^)  »  (V  ,  .  .  . ,  V  )  where  V  ~  o..  V.  «  pg-h(i)  «  the  expected 

O  in  O         i 

one  period  returns  from  starting  a  period  in  the  various  states. 

T  *  the  transition  matrix  with  entries. 

py      for  j  *  i-1  >  o 
1-u.    for  j  *  i  =  a 
1-X-jj,  for  o  <  j  »  i  <  m 


i 


xj       •*  1-x    for  j  •=  i  »  o 


; 


h 


j  >.      for  o  <  j  *  i   4-  1  <  en 

1  o      otherwise 

P  *  P  (jj.)  «  vector  of  state  probabilities  at  time  .i«  P  *  P  .  T  wit 
n    n  -  n    n-1 

Po  -  (1,  o,   o,  .  .  . ,  o) 

S„  *=  S  (p,)  *.S  8*P,  ~  the  discounted  expected  number  of  visits  to  the 
n    n     k**o   k. 

states  in  the  first  n  4-  I  periods. 
The  problem  is 

max  (Sw(n);  v*(&i))~c(ii) 

o  <  u  <  1-X 

where  (S,V)  is  the  scalar  product  of  S  and  V.  The  restriction  o  <  y,  <  1-X 
assures  that  T  is  a  transition  matrix.  The  finite  state  assumption  avoids 
a  stronger  requirement  such  as  u.  >  X. 

This  problem  is  sufficiently  simply  that  the  reader  may  question  whether 
current  procedures  aren't  adequate  for  its  treatment.  A  queueing  theorist 
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might  start  from  classical  transformation  and  functional  form  analysis.  These 
can  and  have  provided  some  results.   The  special  form  of  the  one  period  re- 
turns guarantees  that  (S  ,V)  depends  on  the  probability  that  the  system  is  idle 
and  the  first  three  moments  of  the  number  in  the  system.  These  quantities 
are  easily  found  from  generating  functions.  Unfortunately  if  one  attempts 
this  analysis,  especially  the  examination  of  second  derivatives  for  concavity, 
he  is  confronted  with  an  exhausting  profusion  of  terms.  Although  these  cost 
functions  are  probably  adequate  for  applications  more  general  ones  are  possi- 
ble, but  they  are  likely  to  necessitate  major  modifications  in  analysis. 
From  the  computational  point  of  view,  elaborate  formulae  are  often  much  less 
useful  than  simple  iterative  schemes.  For  these  reasons  and  others  this  work 
is  based  on  iterative  calculations. 

The  use  of  iterative  calculations  in  a  decision  making  context  does  not 
imply  that  dynamic  programming  is  the  basis  of  the  calculations.  There  are 
of  course  mathematical  similarities.   There  is  also  a  modeling  relationship. 
Any  design  problem  can  oe  considered  as  a  sequential  decision  problem  by  in- 
troducing the  possibility  of  future  redesign.  For  the  problem  of  this  paper 
it  is  reasonable  to  ask  why  not  elaborate  and  find  which  service  rate  to 
use  at  each  possible  congestion  level.  Additional  information  about  costs  of 
changing  tin      ce  rate  v- :  I  be  needed.   The  expansion  will  naturally  raise 
quest!-         aterioration  and  innovation  which  are  important  issues  in 
replacement  analysis.   It  will  also  increase  computing  costs  since  the 
fundamental  process  now  must  have  states  which  specify  both  current  congestion 
level  and  also  current  service  rate.   It  is  difficult  to  argue  against  the 
desirability  of  this  approach.  At  most  one  must  raise  the  question  of  whether 
the  costs  of  more  complete  analysis  and  more  extensive  calculations  are  justified 
by  improved  designs.  Such  issues  are  especially  important  when  one  considers 
other  kinds  of  desirable  modifications  of  the  decision  model.  Rarely  are  the 
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assumed  simple  arrival  and  service  processes  better  than  rough  first  approxi- 
mations of  the  real  system.  Perhaps  increasing  the  complexity  of  these  assump- 
tions is  more  important  than  changing  the  cost  and  decision  assumptions. 

If  one  formulated  a  sequential  decision  problem  for  choosing  service  rates 
in  this  system  and  then  res       his  c  will  have  the  problem  of 

this  paper.   First  only  stationary  policies  are  considered.  This  is  not  only 
reasonable  but  often  optimal  in  sequential  decision  processes.  The  novel 
restriction  is  that  the  same,  action  must  be  used  in  every  state.  The  inter- 
esting theory  in  dynamic  programming  analysis  of  sequential  decisions  in 
Markov  processes  is  firmly  founded  on  the  opposite  assumption.  The  choice 
of  action  in  one  state  at  some  time  does  not  prohibit  the  choice  of  any 
action  at  any  other  state  at  this  time  or  any  action  at  any  state  at  other 
times.  This  assumption  makes  it  possible  to  maximize  vector  valued  functions. 
Effective  analysis  of  the  problem  of  this  paper  requires  overcoming  the  loss 
of  this  operation  both  in  theory  and  calculation.  In  a  sense  this  work  and 
its  generalizations  offer  the  possibility  of  solving  dynamic  programming 
problems  when  the  alternative  policies  are  restricted  to  have  forms  such  as: 

use  action  A  for  states  1  to  n,  .  act!  n  B  for  states  nA  +  1  to  n_s  etc,  Even 

k'  A        B 

when  they  are  not  optimal  in  the  decision  model ,  such  policies  are  easy  to 
implement  and  may  be  opti;      one  considers  the  cost  of  using  a  policy. 

Before  proceeding  to  the  analysis  another  general  comment  on  this 
problem  and  previous  work  should  be  ms.de ,     A  number  of  authors  have  developed 
the  relation  between  linear  programming  a.nd   dynamic  programming  in  Markov 
processes >   Derraan  [2].  The  problem  posed  here  has  a  linear  programming 
equivalent.  This  is  obtained  by  considering  the  vectors  S  as  decision 
variables  as  opposed  to   p,.   Introducing  some  additional  decision  variables 
a(u,),  the  vectors  satisfy. 

sjpXi-erCi*))  -  *<u,)P 
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and  the  a(ji)  must  satisfy 

£  o?(ji)  -  I. 
M 
In  addition  Qf<^L>  >  0  S  (jj,)  >  0.  The  cost  of  ^  in  the  original  objective 

function  is  I  <y(u)  c(u).  The  scalar  product  becomes  E(S  (u).  V(u>).  The 

result  is  an   infinite  linear  program.  Even,  using  a  finite  number  of 
alternative  p,  values  does  not  produce  an  effective  solution  procedure  of 
an  approximation  of  the  original  problem.   The  obvious  decomposition  re- 
sults in  a  procedure  which  solves  for  all  S  (jj,)  .   The  effect  is  optimisa- 
tion by  enumeration.  This  is  in  sharp  contrast  to  the  efficiency  of  a 
simplex  algorithm  in  dynamic  programming  where  it  is  intimately  related 
to  Howard's  algorithm. 

Although  the  linear  recursive  equations  of  Markov  chains  must  be  and 
will  be  used,  the  optimization  aspect  of  this  paper  uses  a  nonlinear  pro- 
gramming method.  Although  the  gradient  search  procedure  is  far  from  novel, 
there  is  a  major  innovation.  At  each  stage  of  the  calculation  approximations 
are  used  for  values  of  the  objective  function  and  its  derivative.  Using 
approximations  is  certainly  eminently  reasonable  if  the  effects  of  errors 
can  be  controlled,  and  calculation  time  is  saved.  What  is  surprising  is 
that  the  crudest  possible  approximat  isful  in  all  numerical 

work,  and  special,  error  control  procedures  were  not  needed, 
Basic  Iterative  Schemes 

Even  the  simple  Markov  chains  of  this  problem  offer  a  variety  of 

alternative  iterative  calculations.   Some  of  these  are 

A)  P  =  P  ,T,P  given  B)  V  *  TV  , ,V  -  V 

n    n-1   o  '   n     n-1  o 

C)  S  «  P  +  0S  ,T,S  -  P  D)  W  «  V  +  prw  ,  ,W  =  V 

n    o     n-1  *  o    o  '   n    o      n-1'  o   o 

Let  *  denote  the  vector  or  matrix  of  derivatives.  From  the  definitions 
of  T  and  V. 
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/o  o  .  .  .  o\  /o\ 

1-1  ...  o\  /8  \ 

i     \  \p  r  -     . 

\  o  ^  x  \  • 


Using  this  notation  provides  ■     ■   • 

A')  P'  -  P'  ,  T  +  P   ,T',  P'  -  0       B")  V'  -  TV''  .  +  T'V   -,V  -  V" 
n    n-1      n-1     o  n     n-1      n-i  o 

C')  S'-«6S'  ,T  +  9S   ,T',  8'  -        D')  W'  »  V'  +  ST'W  ,  +  BTW'  ,  ,W'  -  V 
n    n-I      n-1     o  n  n-1      n-1  o 

The  second  derivatives  of  both  T  and  V  are  zero  vectors.  Thus 

A")  P"  *  2P'  t'  +  P"  T.  P"  -  0        B")  V"  -  2T'V  -  +  TV"  ,V"  *  0 

'   n      n-1      n-1  *   o  n       n-1     n-1  o 

C")  S"  -  2gS'  ,T'  +  PS",,  S"-  0         D")  W"  -  20T'W'   ,  +  8TW",  W"  -  0 
n     '  n-i      n-i   o  n         n-i      n~i  o 

All  of  these  processes  converge,  The  discounted  processes  are  the  easiest 

to  study  since  contraction  mapping  arguments  using  norms  may  he  used.  For  C 

|s    -  s    ,  !<  sis    ,  -  s    J|tI-8|s    ,  -  s    J 

n    n-1'—    n-1    n-2  !  '   •  n-i    n»2 
using  the  maximum  of  the  sums  of  absolute  values  of  the  coordinates  &g  the  vector 

norm  and  the  corresponding  matrix  norm  which  is  the  maximum  of  the  row  sums  of 

the  absolute  values.  ThuslTi85!,  For  B 

|V  -  V  .|S8|T|  |V  -  -  V  J-      .  «  V  J 

1  n    n-1    '  '  n-1    n-2  '    n-i    n-2 

when  the  vector  norm  is  the  maximum  of  the  absolute  values  of  the  coor- 
dinates.     Changing  the  o:  i   the  multiplication  leaves  the  associated 
matrix  norm  the  same  as  in  the  ise. 

C")    |S'   -  S'   .1*8 |S'   .    -  8'  ,|+e|S      ,    -  S     J|T'| 
'        n         n-1'      '   n-i         n~2!       '    n-1         n-2   ' 

and 

D')    |W'   -  W   _|<8JT||W'   .    -  W   ,|+8|T'||W      ,    -  W     J 

i  r    n-1  '—   '  f  n-1    n-21      '  n-1    n-2 

Since  both  the  S  *s  and   the  ft  ' s  coverage  geometically  with  the  suc- 
n  n 

cessive  norms  bounded  by  a  constant  times  3  »  the  general  form  of  both 
relations  is 
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n+1     n  P  p 
Either  a  goes  to  zero  as  3  or  there  is  some  n  and  l>y>8  for  which 

(y~B)a  >  (3K3  "  .  In  the  latter  case  a    <  ya  .  From  this  point  on  indue- 

tively  a  ,.  <  ^a  since  a  .  . .-  i  3a  , .  +  gK$n+^",t  <£&Y^a„  +  y^6KBn"1<  ^+xa 
^  n+j     n       n+  '  n 

Thus  a    converges  to  zero  geometrically  with  parameteo  y.  This  argument 
n+j 

ean  be  repeated  foi    "  and  D**'"  although  a  new  parameteo  y*  will  be  needed 
for  which  l>y*>y0 

Unfortunately  for  A  and  E,  JTJ^l  does  not  imply  convergence.  From 
positive  operator  theory  since  T  is  strictly  positive  for  n  sufficiently 
large,  T  converges  geometrically  to  the  projection  operator  ePM.  The 
coordinates  of  e  are  all  l's  and  it  is  the  right  eigenvector  for  eigenvalue 
1.  P^,  is  the  vector  of  limiting  probabilities  and  the  left  eigenvector  of  T 
for  eigenvalue  1.  The  algebraic  and  geometric  multiplicities  of  eigenvalue 
1  are  1.  This  means  that  the  vector  spaces  may  be  decomposed  into  one 
dimensional  subspaces  and  other  subspaces  associated  with  the  other  eigen- 
vectors which  are  all  less  than  1  in  absolute  value.  Even  using  this  decom- 
position and  restricting  T  to  the  subspace  associated  with  the  eigenvalues 
other  than  1  does  not  allow  the  easy  norm  argument  to  be  used  on  the  deri- 
vatives. What  must  be  used  is  that  the  powers  of  the  restricted  operator  TR  con- 
verge to  zero  with  entries  Tn   <  Kyn  where  y  is  the  spectral  radius  which 
is  less  than  1.  Karlin  [3]« 

First  since  T'e-0  inductively  (P',e)  =  {F^e)<i,  This  means  that  both 

n     n, 

TA   and  P"  lie  in  the  subspace  associated  with  the  eigenvalues  other  than  1. 

j,  \ 
Thus  P  r  converge  to  the  origin  geometrically.  The  iterative  scheme  may  be 

written  as 

n-1     .  k  .•   n-1 

P'«  I     P4T*T  *  I  P.TT  +  2   P.IT 

1=0  x  i*Q  x      i*fcfl  1 

For  fixed  k  the  first  term  goes  to  zero  as  n  goes  to  infinity.  For  k  large 

all  of  the  vectors  P.  differ  from  P^  by  at  most  S  in  any  coordinate.  As  n 


goes  to  Infinity  the  geometric  convergence  guarantees  that  the  sum  converges. 

In  any  coordinate  the.  sum  can  differ  frc     E   P  T  T  by  at  most  e,  K/(1-y) 

i^O  K 

for  some  constant  K  and  y   the  the  re         of  T  to  the 

subspace.  Letting  k  become  large  £v  goes  to  lly.   The  argument 

may  be  repeated  for  the  second 

For  B'  the  argument  can  be  more  direct  since  Vr_  converges  to  a  constant 

times  e.  Thus  for  n  large  T'V   .,  has  coordinates  not  greater  in  absolute  value 

°  n  - 1  ° 

than  e  e.   From  this  point  on 
n 

V .'  -  TV  '  ,   f  or  i  >  n 

i     i~l 

This  converges  geometrically  as  i  goes  to  infinity  to  a  constant  times  e. 

The  total  effect  of  the  errors  is  again  no  worse  than  e  K/(l-#)  which  goes  to 

n 

sero  geometrically.  Again  the  second  derivative  merely  requires  a  repetition 
of  the  argument. 

Not  only  do  the  iterative  processes  converge  to  limiting  values  but  the 
derivatives  of  the  limits  are  the  limits  of  the  derivatives.  F       dis- 
counted sums  of  process  C)  the  fac       T  is  analytic  in  p,  implies  that  S^ 
is  analytic.   Thus  the  derivatives  of  the  limits  must  exist.   The  derivatives 
must  satisfy 

S    (I-pT)    -   ? 

0  -    (S    (I-BT))'   -  S'(I-gT)   -f   S    (I-8T")    or  S '   =   BS   T'(I-eT)"1 
0  «   (a    f I-BT))''  -  S"(I-8T)  +  2  S"(-8T")    or  S"-   2SS  "(I-ST)"1 

CDX  '  '  CO  '  CO v  '  CO  OS  ' 

and  these  equations  are  satisfied  by  the  limits  of  the  iterative  process. 
Process  D  is  similar.   For  process  A),  P  is  analytic  in  y,«   This  can  be  veri- 
fied  directly  since  P  =  k  (1,  X/p,,  X  /y,  ,  . ..,  >m/p,m)  where  k  is  the  reciprocal 
of  the  sum  of  the  coordinates.  The  indirect  approach  uses  the  fact  that  the 
eigen  projections  of  T(p)  are  analytic  in  p,  since  T  is  analytic  in  p..  Even 
for  complex  p,,  T  has  eigenvalue  1  as  a  simple  determinant  argument  shows. 
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for  the   k  +    L  x  k  +   1  matrix  define: 

.  -  X  -  a  0 


D,  =det 
k 


i  -  A. 


o 


o 


These  determinants  satisfy: 
D  -  (1-o)(1-X-|a) 
Dk=  (l-^)(Dk^  -  XDk.2)  -  Xu-D^ 

-  (1-pror)  Dk_x  +  (1-w)  Dfc  2 

-  (1-or)  gk(or,|jt) 

where  g,(osp)  is  a  polynomial  in  af  with  coefficients  which  are  polynomials 
in  p,.  This  means  that  the  eigenvalue  1  resulting  from  the  (1-Qf)  factor  is 
associated  with  a  one  dimensional  eigen  projection.   For  p,  real,  o  <  •*  <  l->, 
positive  operator  theory  shows  that  the  projection  associated  with  eigenvalue 
1  has  a  one  dimensional  range  and  other  eigenvalues  have  absolute  values 
strictly  less  than  one.  [3,4] 
S  ince 

?  ri-T)  -  o 

CO        ' 

0  =  (Pm(I-T))'  -  P'fl-T)  +  ?  (-T')  or  P'(I-T)  =  P  T' 
and  0  =  (?  (I-T))"  -  Pl'(I-T)  +  2P'(-T')  or  P'(I-T)  -  2P'V 


The  limits  of  the  iterative  processes  also  satisfy  these  equations  but  they 
do  not  have  unique  solutions.  They  merely  determine  vectors  up  to  additions  of 
multiples  of  P  which  are  the  solutions  of  D(I-T)  -  0.   The  additional  require- 
ment that  the  coordinates  of  P  sum  to  1  means  that  the  sums  of  both  first  and 

second  derivatives  must  be  zero.   The  iterative  schemes  for  P'  and  p' 'satisfy 

n      n 

these  conditions  for  all  n  and  their  limits  must  also.  Thus  for  p,  close  to  real 
p,  the  projection  must  be  holoroorpbic  and  all  derivatives  of  the  vector  P  (u)  for 
which  |P  (p) !  ~   1  must  exist.   For  process  B)  the  general  argument  for  the 
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differentiability  of  V  is  the  same  as  the  one  for  P  .   The  functional  form 
is  g(|i)  (1,  1,  ...,  1)  for  some  one  dimensional  function  of  u.   An  additional 
condition  is  necessary  to  determine  t      it  uniquely.  This  is 

(PQ,  V*)      0,  VQ) 
which  means  that 

<?0,  vQ   .  (Pi,  vo>  +  (p,        ,  t"-vq)  4-  (po,  tV> 

and 

(P^,  Vi')  -  (P£\  VQ)  +  2(P^  Vp  4-  (Pco,  y-) 

«  (P  ,  T^'V  )  +  2<P  ,  T^'V')  4  {?  ,  TV*) 
O        O'        0       O      "  o      o 

T  'exists  since  the  rows  of  T*   are  all  P^  and  this  vector  must  have  derivatives. 

Using  T   »  (T  Tr  x)  -  T  T    +  T  I   ,  and  rv         -  T    V   4  I"   V   , 
°  o  o         o 

and   (T11?  )"     ■  Tr'  "v     4-  2T  "$*  4-  T     V  "■„   the  it@ra.tive  processes  also  satisfy 
o  o  o  o 

(P    »  V*)   -   (P   ,   Tn"*;/   )   +   (P  .,   tV) 
.on  o  o  o*  o 

and 

(P    ,   V~)    -    (P    ,    Tn"'v   )    +  2(P    ,    T^Vj    4    (P    ,    TV) 
on  o  o  o  o  o  o 

Thus  the  iterative  processes  must  converge  to  the  derivatives  of  the  limits. 
Concavity 

Two  properties  of  the  objective  function  are  crucial  to  the  optimiza- 
tion. These  are  differentiability  and  the  existance  of  a  unique  local 
optimum  which  is  the  global  optimum.  Tn&   first  has  already  been  discussed 
with  the  iterative  methods  of  calcul     t.  Although  concavity  is  a 
stronger  property  than  needed  to  guarantee  that  derivative  equals  zero 
identifies  the  op  .  it  is  easier  to  work  with  than  some  form  of 

unimodality.  The  reason  is  that  concavity  is  preserved  under  addition. 
Thus  the  gain,  the  holding  cost  and  the  cost  of  the  service  rate  can  be 
treated  separately. 

The  discounted  expected  gain  is  perhaps  most  easily  studied  through 

W  =G  +  3TW  . ,  «  =0 
n  n-1   o 

The  sequences  {W  }s  {W}  and  {w'"}  have  the  following  properties 


I)   W  *  0 
n 


2)  0  ^  T"  W  *  -  G' 

n 

3)  T"2  W  «  -  T*  G 

n 

4)  W  £  0 

n  " 

5)  T'  W  £  0 

n 

6)  W"  £  0 


a 


] 


The  proof  is  inductive,  All  results  are  obvious  for  W  as1W  =  W'*''«0. 
r  o    o    o 


Assuming  them  true  for  n-1 


W  »  G  +  3  T  W  , 
n  n-1 


is  nonnegative  since  T  has  nonnegative  elements,   Next 


By  direct  calculation 


T'W  -  x*  G  +  S  T"  T  W 


a 


n-1 


where 


X*  « 


T"  T  —  T*  T' 


0 


1-A-jj   X 


o 


y     \ 

V 


V, 


y  1-X-p   | 

The  matrix  I*  is  nonnegative  with  row  sums  not  exceeding  1.   It  agrees  with 

T  except  in  the  first,  second  and  last  rows.   Since  T"  W  ,  ■£  0,  T"*  T  W  ,  * 

n-1  n-1 


T*  T'  W  .  ^  0.  Adding  T*  G  £  0  means  that  T"  W  ^  0.   Since  T-*  W  ,  * 
n-i  n  n~i 


T  T  W 


n- 


1*  T  W  1  £  T*  C-G")  *  (0,  -  (l-U)g,  -g, 


n-1 


g) .   The  last 


inequality  uses  the  specific  forms  and  permits  the  addition  of  T**  G  =  (0,  -ygj 

2 
0,  .  .  * »  0)  giving   '    >  -C.  For  T"   W  ,  the  first  term  T-**  G  *  -T  G  . 

>2 
The  T^  T  product  can  he  rewritten  as 

/  0  0 


'2  **   ^2 

rr   *"*   rn   — .  rsi  m   *-■ 


where  Q  is  a  nonnegative  matrix  with  row  sums  not  exceeding  1.  Thus 
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/ 


T'2Wn   -   T'2  G+    CQT'24/  )      Tf2l  Wn-1 


0   -   X    .    . 


Under  the  induction  hypothesis  T'  VI ^    .   <   -T'G'  ■  (0,  g,  0}  .  .  . ,  0). 

Since  T'2G  =  (0,  ug,  -  ug,  0,  .  .  .  ,  0) ,  QT'2  W  .  +  T'2  G  <  (0,  g,  0,  ...,  0) 

The  remaining  term  contributes  -  X  (W  , ,    -  W  , ,    , )  to  the  last  coordi- 

n- I     m  1 ,   m- L 

nate,    but   this   term  must  be  negative   since  T%?     ,    <  0..     Next 

n-1 

W'  -  G'  -I-  T*  W   ,  +  V     >  0 


r; 


n~  i 


since  T'  W   ,  >  -G'  and  T  W'  ,  ">  0  because  T  >  0  and  W'   ,  >  0. 

n-1  n-i  n-i 

Property  5  requires 

TV  -  T'G'  +  T'W   ,  +  T'  T  W'  ,  <  0 
n  n-1         n-1 

Again  rewrite  T'  T  as  T   T".  The  induction  hypothesis  T'  W   ,  <  0 

n-1 

and  the  nonnegativity  of  l"  imply  that  T'  T  W  ,  <  0.   The  induction 

J  n-1 

hypothesis  also  imples  Tw'  W     ,    <  -T"G"  -which  at  most  cancels  the  Tl 

n~"-t 

term.  Thus  T4*  W'  <  0,  Finally  the  important  property  W""  <  0 

W'  -  2T"*  V*  ,  4-  T  W'  . 
a       a~l     n-1 

The  nonnegativity  of  T  and  W'  4?  0  means  that  the  second  term  is  nonpcsitive 
and  the  first  term  is  nonpositive  by  the.  induction  hypothesis. 

The  holding  cost  term  is  not  a  analyze  as  one  might  expect. 

Consider 

W  =  0 

o 


W  -  H       W  , 
n  n-l 


Since 


T'  W  «T'H+3T'TW  ,  *  T'  H  *  3  (I  +  y  T"  +  X  T   ;  T  W  , 
n  n-1  n-1 

where  I  +  nl'+^T%0,  and  T "  H  £   0."  inductively  T'  W  <:  0.   This 

n 


means  that 


W  -  0 
o 

W'=gT'W  ,+TW', 
n        n-1      n~l 
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must  produce  a  sequence  of  vectors  W  which  are  nonpositive.   This  is  the 

n 

expected  result  that  increasing  the  service  probability  will  decrease  the 

sum  of  the  discounted  expected  holding  cost  regardless  of  the  starting 

state.   For  the  second  derivatives  one  must  consider 

W"  »  8  2T'  V?"  ,  +  8  T  W", 
n  &-1       n-1 

^2 
If  T  W  is  always  notmegative  then       !.   For  this  tc  be  true  T  '  W 

xi  s.^  n^x 

»«    *      *  -2 

+  <T   T  W  ,  must  be  nonnag.-.       The  problem  is  that  T  '  W  ,  need  not 
n—1  n-1 

remain  nonnagativa  even  though  T      H  ^  G,   One  exception  is  the  two  state 

2  2 

case  in  which  T   -  -T  and  thus  T"  W  ■»  -T*  W  >  0.  General lv  the 

n       n  ™ 

,2  Z 

attempt  to  reverse  T   and  I  produces  t**  X"*' 

The  only  problem  £g  the  -X  terms  of  the  last  row  of  the  T  ■'.  These  are  indicative 

of  the  fact  that  ¥  may  not  remain  convex  in  the  state  variable.  Clearly 

n 

this  is  a   difficulty  caused  by  the  upper  boundary  of  the  finite  state 

assumption. 

A  simple  example  shows  that  the  holding  cost  term,  is  not  always 

convex.   This  Is  the  three  state  system  with  u  «  0,  A  ~   I  and  H  ffi  (0,  15  2). 

Table  I  shows  what  happens  in  Iterating  W  ~  T  W  ,  ,  W  ■=  T  W  -  + 

is  n™"l *      n  n~i 

T-*W     .  ,    and  W*"*-   2Tr   W-*    ,    +  T  vC '  ,    start ;  i     ■  R. 

n-1  n  n-1  n   "  o 

xaole   x 


n 

ff 

W 

w 

(P    ,   W      ) 

n 

■': 

n 

o       n 

0 

(0,    1,    2) 

(  o,    os 

0) 

(   0,      04 

0) 

0 

1 

(1,    2,    2) 

(  0,   -1, 

"I) 

(  os     0, 

0) 

0 

2 

(2,    2,    2) 

\~i-  s     ~^J  j 

-1) 

(   0,      2, 

0) 

0 

3 

(2,    2,    2) 

-1) 

(  2,      2, 

-2) 

2 

4 

(2,    29    2) 

V"°*l.  ^     ""A  s 

-I) 

(  2,    -4, 

-2) 

2 

5 

V_  Z  ^     £.  %     &* ) 

\""*"X  ;      **Xj 

-I) 

(-4,   -2, 

-2) 

-4 

6 

(2,    2,   2) 

\"""J-  *      ™"i-  js 

-1) 

.  (~2t   -2, 

-2} 

-2 
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Although  y  «  0,  X  «*  1  is  an  extreme  point,  the  continuity  of  the  3econd 
derivative  guarantees  that  this  is  the  behavior  for  y  close  to  0,  X  close 

to  1.  Even  here-,  if  3  is  sufficiently  small,  US'  (?  >  ¥  )  will  be  convex 

on 

in  y, 
Algorithms 

For  sequential  decision  est  presents  a  serious 

problem,  The  holding  cost  may  be  eliminated  and  the  relatively  simple 
convexity  argument  used  to  establish  a  simple,  form  lor;   the  optimal  y  as 
a  function  of  the  state  variable*      Alternatively  one  may  seek  a  more 
sophisticated  analysis  perhaps  using  other  restrictive  assumptions  or 
show  that  the  optimal  policy  is  a  member  of  some  more  complex  class  of 
policies.   The  effect  of  non^convexity  on  this  analysis  of  the  design 
problem  is  not  as  disastrous.   The  reason  is  that  computational  procedures 
for  finding  optimal  designs  not  characteristics  of  optimal  designs  are 
sought. 

The  procedure  which  has  solved  a  number  of  examples  is  the  following 
iterative  calculation; 


s     ■ 
o 

P 
o 

o 

0 

s     » 
n 

P       + 
0 

B  T  (y  )   S 
s 

S"  ■ 
n 

6  T" 

n-I                    n 

/V  +  h     if  (S  ,  V')  +  (S*,  V(u  ))-c'(y)  *  0 

(y  -  A  if  (s  ,  V)  +  (S%  v(y  ))-c'(y)  <  0 
Vjn    n     n         n     n 


'1/2  &  if  u  .-  «  y 

' A      otherwise 
n 


Stop  if  <&_  V")  +  (S',  V  (y  ))-c*(y)  is  approximately  0 

xx.  n 

and  S  -  S  ,  and  S^  -  S-*  -  are  approximately  0 
n    n-1      n    n~l      * r        ' 


Although  this  procedure  is  iterative,  it  is  quite  different  from  a 

dynamic  programming  calculation.  At  each  stage  of  the  calculations  all 

periods  of  a  finite  horizon  are  affected.  The  one  period  values  for  all 

states  at  all  times  are  changed  from  those  of  V(y  ,)  to  the  values  V(y  ). 

The  term  c(y)  is  also  changed  as  the  y  values  change.  This  term  was  not 

modified  for  the  finite  horizon.  This  could  have  been  accomplished  by 

using  c**(y)  a  (1-3)  c(y)  as  a  one  period  cost  of  the  service  rate  y. 

The  iteration  would  then  use  service  rate  cost  of  c*(y  )  ~   c**(y  )  +  $ 

n        n 

(c**(y  )/  c**(y  ,))  c*(y  -) .  The  probability  modification  is  different 


also.     Using  T  (y  )  -  I 


ti-1     .     j 

Z     B3     ir     (T(y  ) 

«*o         i^o  u—a— x 


becomes 

B  "?  B3    I    I  <Vl-i>  T  (V  +  I 

i»o    1*0 
In  other  words  the  new  transition  matrix  is  the  last  transition  in  the 

sequence  leading  to  each  of  the  periods  represented  in  S  . 

Convergence  is  obviously  a  crucial  issue.  All  examples  tested  so 

far  have  had  convex  objective  functions .   Convergence  has  paralleled 

the  convergence  of  S  and  S'  for  y  held  constant  at  the  optimal  value. 

As  with  standard  dynamic  programming  algorithms,  convergence  was  sensitive 

to  the  discount  factor.  From  the  logical  point  of  view,  the  stopping 

rule  guarantees  that  the  process  only  terminates  in  a  local  maximum.   In 

the  examples  the  initial  value  of  the  increment  in  y  was  small  compared 

to  the  interval  0  to  1-A.  This  meant  that  a  number  of  iterations  were 

required  to  move  to  the  optimum,  but  these  would  have  been  required  for 

an  approximation  of  the  infinite  horizon  objective  function  anyway.  The 

number  of  reversals  of  direction  did  not  exceed  5%  of  the  number  of 
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iterations,  but  the  points  at  which  the  reversals  occurred  did  not  serve 

as  brackets  for  the  value  which  maximized  the  objective  function.  As 

far  as  n  period  approximations  of  rhe  objective  function  are  concerned, 

these  also  are  not  mono tonic.   If  convergence  can  be  shown  it  will  not 

be  based  on  a  simple  monotonic  argument. 

The  original  concept  of  the  algorithm  was  one  of  using  a  sequence 

of  approximations  of  the  objective  function  which  can  be  guaranteed  to 

become  arbitrarily  good.   In  this  circumstance  if  this  gradient  type 

algorithm  would  work  on  the  objective  function  and  its  true  derivative, 

then  it  would  work  after  the  approximations  became  sufficiently  good. 

This  concept  uses  repeated  ran  iterations  of  the  approximation  at  the 

n  th  stage  where  the  m  become  arbitrarily  large  as  n  goes  to  infinity. 

This  process  involves 

m  , 

S°   -  P  +  6  T(y  )  S  V" 
n      o       n   n-1 

m  .,        m  n 

ar°  =  3  t-  s  n:1  +  a  t  s  n:1 

n  n-1        n-1 

Sk'  -  p  +  B  f  (p  )  sk~l 
r>      o        *n   n 

s,k  m  B   r  gk-l  +  g  T  (  )   g,k-l 
n  n  n   n 

It  was  not  expected  that  m  *  1  for  all  n  would  produce  such  great  success 

as  has  been  experienced.  On  the  other  hand  so  far  no  convergence  proof 

for  this  or  similar  algorithms  have  been  found. 

Approximately  optimal  dynamic  programming 

The  difference  between  design  and  control  problems  and  the  difference 

between  this  algorithm  and  dynamic  programming  have  been  emphasized  so 

that  the  reader  will  not  jump  to  incorrect  conclusions.   It  is  now 

appropriate  to  point  out  that  the  procedure  suggested  here  can  solve 
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dynamic  programming  problems  if  the  optimal  policy  is  stationary  and  the 
decision  variable  in  each  state  is  continuous.   For  several  decision 
variables,  the  single  derivative  of  this  algorithm  mnst  be  replaced  by  the 
appropriate  numbers  of  partial  derivatives.   This  increases  the  magnitude 
of  calculation  in  each  iteration.   The  stopping  rule  becomes  gradient 
equal  to  zero  as  well  as  appropriate  convergence.   Changes  in  decision 
variables  are  made  by  adding  change  vectors  in  the  estimated  gradient 
direction  to  current  decision  variable  values.   For  dynamic  programming 
there  is  one  decision  variable  for  each  state.   For  each  state  the  one 
period  expected  values  of  the  starting  state  and  decision  variables  as 
well  as  transition  probabilities  must  depend  on  one  decision  variable. 
The  decision  variable  for  state  k  cannot  affect  these  functions  for  any 
state  j  4   k.   In  this  example  the  service  rate  to  be  used  in  each  state 
could  be  the  decision  variable  for  each  state  in  a  dynamic  programming 
problem  for  the  optimal  control  of  service.  Restricting  the  decision  to 
choosing  one  service  rate  for  states  ^  k  and  another  possibly  different 
rate  for  states  greater  than  k  is  not  possible  in  dynamic  programming 
algorithms.  The  algorithm  of  thi  paper  can  solve  ejch  problems  for 
optimal  stationary  policies  for  given  initial  conditions.  Moreover  the 
fewer  decision  variables  the  less  the  calculation  cost.  Thus  if  one 
can  decide  that  a  policy  of  use  y5lfor  states  £   k  and  u_  >  k  might  be 
approximately  optimal  in  some  sense,  this  algorithm  can  find  the  best 
such  policy.  This  may  provide  useful  answers  to  practical  problems  where 
analysts  have  hesitated  to  try  dynamic  programming  because  of  calculation 
costs  and  the  possibility  of  policies  which  are  too  complicated  to  be 
implemented. 
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